home *** CD-ROM | disk | FTP | other *** search
Text File | 1991-04-20 | 47.2 KB | 1,179 lines |
-
-
-
-
-
-
- Network Working Group V. Jacobson
- Request for Comments: 1185 LBL
- R. Braden
- ISI
- L. Zhang
- PARC
- October 1990
-
-
- TCP Extension for High-Speed Paths
-
- Status of This Memo
-
- This memo describes an Experimental Protocol extension to TCP for the
- Internet community, and requests discussion and suggestions for
- improvements. Please refer to the current edition of the "IAB
- Official Protocol Standards" for the standardization state and status
- of this protocol. Distribution of this memo is unlimited.
-
- Summary
-
- This memo describes a small extension to TCP to support reliable
- operation over very high-speed paths, using sender timestamps
- transmitted using the TCP Echo option proposed in RFC-1072.
-
- 1. INTRODUCTION
-
- TCP uses positive acknowledgments and retransmissions to provide
- reliable end-to-end delivery over a full-duplex virtual circuit
- called a connection [Postel81]. A connection is defined by its two
- end points; each end point is a "socket", i.e., a (host,port) pair.
- To protect against data corruption, TCP uses an end-to-end checksum.
- Duplication and reordering are handled using a fine-grained sequence
- number space, with each octet receiving a distinct sequence number.
-
- The TCP protocol [Postel81] was designed to operate reliably over
- almost any transmission medium regardless of transmission rate,
- delay, corruption, duplication, or reordering of segments. In
- practice, proper TCP implementations have demonstrated remarkable
- robustness in adapting to a wide range of network characteristics.
- For example, TCP implementations currently adapt to transfer rates in
- the range of 100 bps to 10**7 bps and round-trip delays in the range
- 1 ms to 100 seconds.
-
- However, the introduction of fiber optics is resulting in ever-higher
- transmission speeds, and the fastest paths are moving out of the
- domain for which TCP was originally engineered. This memo and RFC-
- 1072 [Jacobson88] propose modest extensions to TCP to extend the
-
-
-
- Jacobson, Braden & Zhang [Page 1]
-
- RFC 1185 TCP over High-Speed Paths October 1990
-
-
- domain of its application to higher speeds.
-
- There is no one-line answer to the question: "How fast can TCP go?".
- The issues are reliability and performance, and these depend upon the
- round-trip delay and the maximum time that segments may be queued in
- the Internet, as well as upon the transmission speed. We must think
- through these relationships very carefully if we are to successfully
- extend TCP's domain.
-
- TCP performance depends not upon the transfer rate itself, but rather
- upon the product of the transfer rate and the round-trip delay. This
- "bandwidth*delay product" measures the amount of data that would
- "fill the pipe"; it is the buffer space required at sender and
- receiver to obtain maximum throughput on the TCP connection over the
- path. RFC-1072 proposed a set of TCP extensions to improve TCP
- efficiency for "LFNs" (long fat networks), i.e., networks with large
- bandwidth*delay products.
-
- On the other hand, high transfer rate can threaten TCP reliability by
- violating the assumptions behind the TCP mechanism for duplicate
- detection and sequencing. The present memo specifies a solution for
- this problem, extending TCP reliability to transfer rates well beyond
- the foreseeable upper limit of bandwidth.
-
- An especially serious kind of error may result from an accidental
- reuse of TCP sequence numbers in data segments. Suppose that an "old
- duplicate segment", e.g., a duplicate data segment that was delayed
- in Internet queues, was delivered to the receiver at the wrong moment
- so that its sequence numbers fell somewhere within the current
- window. There would be no checksum failure to warn of the error, and
- the result could be an undetected corruption of the data. Reception
- of an old duplicate ACK segment at the transmitter could be only
- slightly less serious: it is likely to lock up the connection so that
- no further progress can be made and a RST is required to
- resynchronize the two ends.
-
- Duplication of sequence numbers might happen in either of two ways:
-
- (1) Sequence number wrap-around on the current connection
-
- A TCP sequence number contains 32 bits. At a high enough
- transfer rate, the 32-bit sequence space may be "wrapped"
- (cycled) within the time that a segment may be delayed in
- queues. Section 2 discusses this case and proposes a mechanism
- to reject old duplicates on the current connection.
-
- (2) Segment from an earlier connection incarnation
-
-
-
-
- Jacobson, Braden & Zhang [Page 2]
-
- RFC 1185 TCP over High-Speed Paths October 1990
-
-
- Suppose a connection terminates, either by a proper close
- sequence or due to a host crash, and the same connection (i.e.,
- using the same pair of sockets) is immediately reopened. A
- delayed segment from the terminated connection could fall within
- the current window for the new incarnation and be accepted as
- valid. This case is discussed in Section 3.
-
- TCP reliability depends upon the existence of a bound on the lifetime
- of a segment: the "Maximum Segment Lifetime" or MSL. An MSL is
- generally required by any reliable transport protocol, since every
- sequence number field must be finite, and therefore any sequence
- number may eventually be reused. In the Internet protocol suite, the
- MSL bound is enforced by an IP-layer mechanism, the "Time-to-Live" or
- TTL field.
-
- Watson's Delta-T protocol [Watson81] includes network-layer
- mechanisms for precise enforcement of an MSL. In contrast, the IP
- mechanism for MSL enforcement is loosely defined and even more
- loosely implemented in the Internet. Therefore, it is unwise to
- depend upon active enforcement of MSL for TCP connections, and it is
- unrealistic to imagine setting MSL's smaller than the current values
- (e.g., 120 seconds specified for TCP). The timestamp algorithm
- described in the following section gives a way out of this dilemma
- for high-speed networks.
-
-
- 2. SEQUENCE NUMBER WRAP-AROUND
-
- 2.1 Background
-
- Avoiding reuse of sequence numbers within the same connection is
- simple in principle: enforce a segment lifetime shorter than the
- time it takes to cycle the sequence space, whose size is
- effectively 2**31.
-
- More specifically, if the maximum effective bandwidth at which TCP
- is able to transmit over a particular path is B bytes per second,
- then the following constraint must be satisfied for error-free
- operation:
-
- 2**31 / B > MSL (secs) [1]
-
- The following table shows the value for Twrap = 2**31/B in
- seconds, for some important values of the bandwidth B:
-
-
-
-
-
-
-
- Jacobson, Braden & Zhang [Page 3]
-
- RFC 1185 TCP over High-Speed Paths October 1990
-
-
- Network B*8 B Twrap
- bits/sec bytes/sec secs
- _______ _______ ______ ______
-
- ARPANET 56kbps 7KBps 3*10**5 (~3.6 days)
-
- DS1 1.5Mbps 190KBps 10**4 (~3 hours)
-
- Ethernet 10Mbps 1.25MBps 1700 (~30 mins)
-
- DS3 45Mbps 5.6MBps 380
-
- FDDI 100Mbps 12.5MBps 170
-
- Gigabit 1Gbps 125MBps 17
-
-
- It is clear why wrap-around of the sequence space was not a
- problem for 56kbps packet switching or even 10Mbps Ethernets. On
- the other hand, at DS3 and FDDI speeds, Twrap is comparable to the
- 2 minute MSL assumed by the TCP specification [Postel81]. Moving
- towards gigabit speeds, Twrap becomes too small for reliable
- enforcement by the Internet TTL mechanism.
-
- The 16-bit window field of TCP limits the effective bandwidth B to
- 2**16/RTT, where RTT is the round-trip time in seconds
- [McKenzie89]. If the RTT is large enough, this limits B to a
- value that meets the constraint [1] for a large MSL value. For
- example, consider a transcontinental backbone with an RTT of 60ms
- (set by the laws of physics). With the bandwidth*delay product
- limited to 64KB by the TCP window size, B is then limited to
- 1.1MBps, no matter how high the theoretical transfer rate of the
- path. This corresponds to cycling the sequence number space in
- Twrap= 2000 secs, which is safe in today's Internet.
-
- Based on this reasoning, an earlier RFC [McKenzie89] has cautioned
- that expanding the TCP window space as proposed in RFC-1072 will
- lead to sequence wrap-around and hence to possible data
- corruption. We believe that this is mis-identifying the culprit,
- which is not the larger window but rather the high bandwidth.
-
- For example, consider a (very large) FDDI LAN with a diameter
- of 10km. Using the speed of light, we can compute the RTT
- across the ring as (2*10**4)/(3*10**8) = 67 microseconds, and
- the delay*bandwidth product is then 833 bytes. A TCP
- connection across this LAN using a window of only 833 bytes
- will run at the full 100mbps and can wrap the sequence space
- in about 3 minutes, very close to the MSL of TCP. Thus, high
-
-
-
- Jacobson, Braden & Zhang [Page 4]
-
- RFC 1185 TCP over High-Speed Paths October 1990
-
-
- speed alone can cause a reliability problem with sequence
- number wrap-around, even without extended windows.
-
- An "obvious" fix for the problem of cycling the sequence space is
- to increase the size of the TCP sequence number field. For
- example, the sequence number field (and also the acknowledgment
- field) could be expanded to 64 bits. However, the proposals for
- making such a change while maintaining compatibility with current
- TCP have tended towards complexity and ugliness.
-
- This memo proposes a simple solution to the problem, using the TCP
- echo options defined in RFC-1072. Section 2.2 which follows
- describes the original use of these options to carry timestamps in
- order to measure RTT accurately. Section 2.3 proposes a method of
- using these same timestamps to reject old duplicate segments that
- could corrupt an open TCP connection. Section 3 discusses the
- application of this mechanism to avoiding old duplicates from
- previous incarnations.
-
- 2.2 TCP Timestamps
-
- RFC-1072 defined two TCP options, Echo and Echo Reply. Echo
- carries a 32-bit number, and the receiver of the option must
- return this same value to the source host in an Echo Reply option.
-
- RFC-1072 furthermore describes the use of these options to contain
- 32-bit timestamps, for measuring the RTT. A TCP sending data
- would include Echo options containing the current clock value.
- The receiver would echo these timestamps in returning segments
- (generally, ACK segments). The difference between a timestamp
- from an Echo Reply option and the current time would then measure
- the RTT at the sender.
-
- This mechanism was designed to solve the following problem: almost
- all TCP implementations base their RTT measurements on a sample of
- only one packet per window. If we look at RTT estimation as a
- signal processing problem (which it is), a data signal at some
- frequency (the packet rate) is being sampled at a lower frequency
- (the window rate). Unfortunately, this lower sampling frequency
- violates Nyquist's criteria and may introduce "aliasing" artifacts
- into the estimated RTT [Hamming77].
-
- A good RTT estimator with a conservative retransmission timeout
- calculation can tolerate the aliasing when the sampling frequency
- is "close" to the data frequency. For example, with a window of
- 8 packets, the sample rate is 1/8 the data frequency -- less than
- an order of magnitude different. However, when the window is tens
- or hundreds of packets, the RTT estimator may be seriously in
-
-
-
- Jacobson, Braden & Zhang [Page 5]
-
- RFC 1185 TCP over High-Speed Paths October 1990
-
-
- error, resulting in spurious retransmissions.
-
- A solution to the aliasing problem that actually simplifies the
- sender substantially (since the RTT code is typically the single
- biggest protocol cost for TCP) is as follows: the will sender
- place a timestamp in each segment and the receiver will reflect
- these timestamps back in ACK segments. Then a single subtract
- gives the sender an accurate RTT measurement for every ACK segment
- (which will correspond to every other data segment, with a
- sensible receiver). RFC-1072 defined a timestamp echo option for
- this purpose.
-
- It is vitally important to use the timestamp echo option with big
- windows; otherwise, the door is opened to some dangerous
- instabilities due to aliasing. Furthermore, the option is
- probably useful for all TCP's, since it simplifies the sender.
-
- 2.3 Avoiding Old Duplicate Segments
-
- Timestamps carried from sender to receiver in TCP Echo options can
- also be used to prevent data corruption caused by sequence number
- wrap-around, as this section describes.
-
- 2.3.1 Basic Algorithm
-
- Assume that every received TCP segment contains a timestamp.
- The basic idea is that a segment received with a timestamp that
- is earlier than the timestamp of the most recently accepted
- segment can be discarded as an old duplicate. More
- specifically, the following processing is to be performed on
- normal incoming segments:
-
- R1) If the timestamp in the arriving segment timestamp is less
- than the timestamp of the most recently received in-
- sequence segment, treat the arriving segment as not
- acceptable:
-
- If SEG.LEN > 0, send an acknowledgement in reply as
- specified in RFC-793 page 69, and drop the segment;
- otherwise, just silently drop the segment.*
-
- _________________________
- *Sending an ACK segment in reply is not strictly necessary, since the
- case can only arise when a later in-order segment has already been
- received. However, for consistency and simplicity, we suggest
- treating a timestamp failure the same way TCP treats any other
- unacceptable segment.
-
-
-
-
- Jacobson, Braden & Zhang [Page 6]
-
- RFC 1185 TCP over High-Speed Paths October 1990
-
-
- R2) If the segment is outside the window, reject it (normal
- TCP processing)
-
- R3) If an arriving segment is in-sequence (i.e, at the left
- window edge), accept it normally and record its timestamp.
-
- R4) Otherwise, treat the segment as a normal in-window, out-
- of-sequence TCP segment (e.g., queue it for later delivery
- to the user).
-
-
- Steps R2-R4 are the normal TCP processing steps specified by
- RFC-793, except that in R3 the latest timestamp is set from
- each in-sequence segment that is accepted. Thus, the latest
- timestamp recorded at the receiver corresponds to the left edge
- of the window and only advances when the left edge moves
- [Jacobson88].
-
- It is important to note that the timestamp is checked only when
- a segment first arrives at the receiver, regardless of whether
- it is in-sequence or is queued. Consider the following
- example.
-
- Suppose the segment sequence: A.1, B.1, C.1, ..., Z.1 has
- been sent, where the letter indicates the sequence number
- and the digit represents the timestamp. Suppose also that
- segment B.1 has been lost. The highest in-sequence
- timestamp is 1 (from A.1), so C.1, ..., Z.1 are considered
- acceptable and are queued. When B is retransmitted as
- segment B.2 (using the latest timestamp), it fills the
- hole and causes all the segments through Z to be
- acknowledged and passed to the user. The timestamps of
- the queued segments are *not* inspected again at this
- time, since they have already been accepted. When B.2 is
- accepted, the receivers's current timestamp is set to 2.
-
- This rule is vital to allow reasonable performance under loss.
- A full window of data is in transit at all times, and after a
- loss a full window less one packet will show up out-of-sequence
- to be queued at the receiver (e.g., up to ~2**30 bytes of
- data); the timestamp option must not result in discarding this
- data.
-
- In certain unlikely circumstances, the algorithm of rules R1-R4
- could lead to discarding some segments unnecessarily, as shown
- in the following example:
-
- Suppose again that segments: A.1, B.1, C.1, ..., Z.1 have
-
-
-
- Jacobson, Braden & Zhang [Page 7]
-
- RFC 1185 TCP over High-Speed Paths October 1990
-
-
- been sent in sequence and that segment B.1 has been lost.
- Furthermore, suppose delivery of some of C.1, ... Z.1 is
- delayed until AFTER the retransmission B.2 arrives at the
- receiver. These delayed segments will be discarded
- unnecessarily when they do arrive, since their timestamps
- are now out of date.
-
- This case is very unlikely to occur. If the retransmission was
- triggered by a timeout, some of the segments C.1, ... Z.1 must
- have been delayed longer than the RTO time. This is presumably
- an unlikely event, or there would be many spurious timeouts and
- retransmissions. If B's retransmission was triggered by the
- "fast retransmit" algorithm, i.e., by duplicate ACK's, then the
- queued segments that caused these ACK's must have been received
- already.
-
- Even if a segment was delayed past the RTO, the selective
- acknowledgment (SACK) facility of RFC-1072 will cause the
- delayed packets to be retransmitted at the same time as B.2,
- avoiding an extra RTT and therefore causing a very small
- performance penalty.
-
- We know of no case with a significant probability of occurrence
- in which timestamps will cause performance degradation by
- unnecessarily discarding segments.
-
- 2.3.2 Header Prediction
-
- "Header prediction" [Jacobson90] is a high-performance
- transport protocol implementation technique that is is most
- important for high-speed links. This technique optimizes the
- code for the most common case: receiving a segment correctly
- and in order. Using header prediction, the receiver asks the
- question, "Is this segment the next in sequence?" This
- question can be answered in fewer machine instructions than the
- question, "Is this segment within the window?"
-
- Adding header prediction to our timestamp procedure leads to
- the following sequence for processing an arriving TCP segment:
-
- H1) Check timestamp (same as step R1 above)
-
- H2) Do header prediction: if segment is next in sequence and
- if there are no special conditions requiring additional
- processing, accept the segment, record its timestamp, and
- skip H3.
-
- H3) Process the segment normally, as specified in RFC-793.
-
-
-
- Jacobson, Braden & Zhang [Page 8]
-
- RFC 1185 TCP over High-Speed Paths October 1990
-
-
- This includes dropping segments that are outside the
- window and possibly sending acknowledgments, and queueing
- in-window, out-of-sequence segments.
-
- However, the timestamp check in step H1 is very unlikely to
- fail, and it is a relatively expensive operation since it
- requires interval arithmetic on a finite field. To perform
- this check on every single segment seems like poor
- implementation engineering, defeating the purpose of header
- prediction. Therefore, we suggest that an implementor
- interchange H1 and H2, i.e., perform header prediction FIRST,
- performing H1 and H3 only if header prediction fails. We
- believe that this change might gain 5-10% in performance on
- high-speed networks.
-
- This reordering does raise a theoretical hazard: a segment from
- 2**32 bytes in the past may arrive at exactly the wrong time
- and be accepted mistakenly by the header-prediction step. We
- make the following argument to show that the probability of
- this failure is negligible.
-
- If all segments are equally likely to show up as old
- duplicates, then the probability of an old duplicate
- exactly matching the left window edge is the maximum
- segment size (MSS) divided by the size of the sequence
- space. This ratio must be less than 2**-16, since MSS
- must be < 2**16; for example, it will be (2**12)/(2**32) =
- 2**-20 for an FDDI link. However, the older a segment is,
- the less likely it is to be retained in the Internet, and
- under any reasonable model of segment lifetime the
- probability of an old duplicate exactly at the left window
- edge must be much smaller than 2**16.
-
- The 16 bit TCP checksum also allows a basic unreliability
- of one part in 2**16. A protocol mechanism whose
- reliability exceeds the reliability of the TCP checksum
- should be considered "good enough", i.e., it won't
- contribute significantly to the overall error rate. We
- therefore believe we can ignore the problem of an old
- duplicate being accepted by doing header prediction before
- checking the timestamp.
-
- 2.3.3 Timestamp Frequency
-
- It is important to understand that the receiver algorithm for
- timestamps does not involve clock synchronization with the
- sender. The sender's clock is used to stamp the segments, and
- the sender uses this fact to measure RTT's. However, the
-
-
-
- Jacobson, Braden & Zhang [Page 9]
-
- RFC 1185 TCP over High-Speed Paths October 1990
-
-
- receiver treats the timestamp as simply a monotone-increasing
- serial number, without any necessary connection to its clock.
- From the receiver's viewpoint, the timestamp is acting as a
- logical extension of the high-order bits of the sequence
- number.
-
- However, the receiver algorithm dpes place some requirements on
- the frequency of the timestamp "clock":
-
- (a) Timestamp clock must not be "too slow".
-
- It must tick at least once for each 2**31 bytes sent. In
- fact, in order to be useful to the sender for round trip
- timing, the clock should tick at least once per window's
- worth of data, and even with the RFC-1072 window
- extension, 2**31 bytes must be at least two windows.
-
- To make this more quantitative, any clock faster than 1
- tick/sec will reject old duplicate segments for link
- speeds of ~2 Gbps; a 1ms clock will work up to link
- speeds of 2 Tbps (10**12 bps!).
-
- (b) Timestamp clock must not be "too fast".
-
- Its cycling time must be greater than MSL seconds. Since
- the clock (timestamp) is 32 bits and the worst-case MSL is
- 255 seconds, the maximum acceptable clock frequency is one
- tick every 59 ns.
-
- However, since the sender is using the timestamp for RTT
- calculations, the timestamp doesn't need to have much more
- resolution than the granularity of the retransmit timer,
- e.g., tens or hundreds of milliseconds.
-
- Thus, both limits are easily satisfied with a reasonable clock
- rate in the range 1-100ms per tick.
-
- Using the timestamp option relaxes the requirements on MSL for
- avoiding sequence number wrap-around. For example, with a 1 ms
- timestamp clock, the 32-bit timestamp will wrap its sign bit in
- 25 days. Thus, it will reject old duplicates on the same
- connection as long as MSL is 25 days or less. This appears to
- be a very safe figure. If the timestamp has 10 ms resolution,
- the MSL requirement is boosted to 250 days. An MSL of 25 days
- or longer can probably be assumed by the gateway system without
- requiring precise MSL enforcement by the TTL value in the IP
- layer.
-
-
-
-
- Jacobson, Braden & Zhang [Page 10]
-
- RFC 1185 TCP over High-Speed Paths October 1990
-
-
- 3. DUPLICATES FROM EARLIER INCARNATIONS OF CONNECTION
-
- We turn now to the second potential cause of old duplicate packet
- errors: packets from an earlier incarnation of the same connection.
- The appendix contains a review the mechanisms currently included in
- TCP to handle this problem. These mechanisms depend upon the
- enforcement of a maximum segment lifetime (MSL) by the Internet
- layer.
-
- The MSL required to prevent failures due to an earlier connection
- incarnation does not depend (directly) upon the transfer rate.
- However, the timestamp option used as described in Section 2 can
- provide additional security against old duplicates from earlier
- connections. Furthermore, we will see that with the universal use of
- the timestamp option, enforcement of a maximum segment lifetime would
- no longer be required for reliable TCP operation.
-
- There are two cases to be considered (see the appendix for more
- explanation): (1) a system crashing (and losing connection state)
- and restarting, and (2) the same connection being closed and reopened
- without a loss of host state. These will be described in the
- following two sections.
-
- 3.1 System Crash with Loss of State
-
- TCP's quiet time of one MSL upon system startup handles the loss
- of connection state in a system crash/restart. For an
- explanation, see for example "When to Keep Quiet" in the TCP
- protocol specification [Postel81]. The MSL that is required here
- does not depend upon the transfer speed. The current TCP MSL of 2
- minutes seems acceptable as an operational compromise, as many
- host systems take this long to boot after a crash.
-
- However, the timestamp option may be used to ease the MSL
- requirements (or to provide additional security against data
- corruption). If timestamps are being used and if the timestamp
- clock can be guaranteed to be monotonic over a system
- crash/restart, i.e., if the first value of the sender's timestamp
- clock after a crash/restart can be guaranteed to be greater than
- the last value before the restart, then a quiet time will be
- unnecessary.
-
- To dispense totally with the quiet time would seem to require that
- the host clock be synchronized to a time source that is stable
- over the crash/restart period, with an accuracy of one timestamp
- clock tick or better. Fortunately, we can back off from this
- strict requirement. Suppose that the clock is always re-
- synchronized to within N timestamp clock ticks and that booting
-
-
-
- Jacobson, Braden & Zhang [Page 11]
-
- RFC 1185 TCP over High-Speed Paths October 1990
-
-
- (extended with a quiet time, if necessary) takes more than N
- ticks. This will guarantee monotonicity of the timestamps, which
- can then be used to reject old duplicates even without an enforced
- MSL.
-
- 3.2 Closing and Reopening a Connection
-
- When a TCP connection is closed, a delay of 2*MSL in TIME-WAIT
- state ties up the socket pair for 4 minutes (see Section 3.5 of
- [Postel81]. Applications built upon TCP that close one connection
- and open a new one (e.g., an FTP data transfer connection using
- Stream mode) must choose a new socket pair each time. This delay
- serves two different purposes:
-
- (a) Implement the full-duplex reliable close handshake of TCP.
-
- The proper time to delay the final close step is not really
- related to the MSL; it depends instead upon the RTO for the
- FIN segments and therefore upon the RTT of the path.*
- Although there is no formal upper-bound on RTT, common
- network engineering practice makes an RTT greater than 1
- minute very unlikely. Thus, the 4 minute delay in TIME-WAIT
- state works satisfactorily to provide a reliable full-duplex
- TCP close. Note again that this is independent of MSL
- enforcement and network speed.
-
- The TIME-WAIT state could cause an indirect performance
- problem if an application needed to repeatedly close one
- connection and open another at a very high frequency, since
- the number of available TCP ports on a host is less than
- 2**16. However, high network speeds are not the major
- contributor to this problem; the RTT is the limiting factor
- in how quickly connections can be opened and closed.
- Therefore, this problem will no worse at high transfer
- speeds.
-
- (b) Allow old duplicate segements to expire.
-
- Suppose that a host keeps a cache of the last timestamp
- received from each remote host. This can be used to reject
- old duplicate segments from earlier incarnations of the
- _________________________
- *Note: It could be argued that the side that is sending a FIN knows
- what degree of reliability it needs, and therefore it should be able
- to determine the length of the TIME-WAIT delay for the FIN's
- recipient. This could be accomplished with an appropriate TCP option
- in FIN segments.
-
-
-
-
- Jacobson, Braden & Zhang [Page 12]
-
- RFC 1185 TCP over High-Speed Paths October 1990
-
-
- connection, if the timestamp clock can be guaranteed to have
- ticked at least once since the old conennection was open.
- This requires that the TIME-WAIT delay plus the RTT together
- must be at least one tick of the sender's timestamp clock.
-
- Note that this is a variant on the mechanism proposed by
- Garlick, Rom, and Postel (see the appendix), which required
- each host to maintain connection records containing the
- highest sequence numbers on every connection. Using
- timestamps instead, it is only necessary to keep one quantity
- per remote host, regardless of the number of simultaneous
- connections to that host.
-
- We conclude that if all hosts used the TCP timestamp algorithm
- described in Section 2, enforcement of a maximum segment lifetime
- would be unnecessary and the quiet time at system startup could be
- shortened or removed. In any case, the timestamp mechanism can
- provide additional security against old duplicates from earlier
- connection incarnations. However, a 4 minute TIME-WAIT delay
- (unrelated to MSL enforcement or network speed) must be retained
- to provide the reliable close handshake of TCP.
-
- 4. CONCLUSIONS
-
- We have presented a mechanism, based upon the TCP timestamp echo
- option of RFC-1072, that will allow very high TCP transfer rates
- without reliability problems due to old duplicate segments on the
- same connection. This mechanism also provides additional security
- against intrusion of old duplicates from earlier incarnations of the
- same connection. If the timestamp mechanism were used by all hosts,
- the quiet time at system startup could be eliminated and enforcement
- of a maximum segment lifetime (MSL) would no longer be necessary.
-
- REFERENCES
-
- [Cerf76] Cerf, V., "TCP Resynchronization", Tech Note #79, Digital
- Systems Lab, Stanford, January 1976.
-
- [Dalal74] Dalal, Y., "More on Selecting Sequence Numbers", INWG
- Protocol Note #4, October 1974.
-
- [Garlick77] Garlick, L., R. Rom, and J. Postel, "Issues in Reliable
- Host-to-Host Protocols", Proc. Second Berkeley Workshop on
- Distributed Data Management and Computer Networks, May 1977.
-
- [Hamming77] Hamming, R., "Digital Filters", ISBN 0-13-212571-4,
- Prentice Hall, Englewood Cliffs, N.J., 1977.
-
-
-
-
- Jacobson, Braden & Zhang [Page 13]
-
- RFC 1185 TCP over High-Speed Paths October 1990
-
-
- [Jacobson88] Jacobson, V., and R. Braden, "TCP Extensions for
- Long-Delay Paths", RFC 1072, LBL and USC/Information Sciences
- Institute, October 1988.
-
- [Jacobson90] Jacobson, V., "4BSD Header Prediction", ACM Computer
- Communication Review, April 1990.
-
- [McKenzie89] McKenzie, A., "A Problem with the TCP Big Window
- Option", RFC 1110, BBN STC, August 1989.
-
- [Postel81] Postel, J., "Transmission Control Protocol", RFC 793,
- DARPA, September 1981.
-
- [Tomlinson74] Tomlinson, R., "Selecting Sequence Numbers", INWG
- Protocol Note #2, September 1974.
-
- [Watson81] Watson, R., "Timer-based Mechanisms in Reliable
- Transport Protocol Connection Management", Computer Networks,
- Vol. 5, 1981.
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
- Jacobson, Braden & Zhang [Page 14]
-
- RFC 1185 TCP over High-Speed Paths October 1990
-
-
- APPENDIX -- Protection against Old Duplicates in TCP
-
- During the development of TCP, a great deal of effort was devoted to
- the problem of protecting a TCP connection from segments left from
- earlier incarnations of the same connection. Several different
- mechanisms were proposed for this purpose [Tomlinson74] [Dalal74]
- [Cerf76] [Garlick77].
-
- The connection parameters that are required in this discussion are:
-
- Tc = Connection duration in seconds.
-
- Nc = Total number of bytes sent on connection.
-
- B = Effective bandwidth of connection = Nc/Tc.
-
- Tomlinson proposed a scheme with two parts: a clock-driven selection
- of ISN (Initial Sequence Number) for a connection, and a
- resynchronization procedure [Tomlinson74]. The clock-driven scheme
- chooses:
-
- ISN = (integer(R*t)) mod 2**32 [2]
-
- where t is the current time relative to an arbitrary origin, and R is
- a constant. R was intended to be chosen so that ISN will advance
- faster than sequence numbers will be used up on the connection.
- However, at high speeds this will not be true; the consequences of
- this will be discussed below.
-
- The clock-driven choice of ISN in formula [2] guarantees freedom from
- old duplicates matching a reopened connection if the original
- connection was "short-lived" and "slow". By "short-lived", we mean a
- connection that stayed open for a time Tc less than the time to cycle
- the ISN, i.e., Tc < 2**32/R seconds. By "slow", we mean that the
- effective transfer rate B is less than R.
-
- This is illustrated in Figure 1, where sequence numbers are plotted
- against time. The asterisks show the ISN lines from formula [2],
- while the circles represent the trajectories of several short-lived
- incarnations of the same connection, each terminating at the "x".
-
- Note: allowing rapid reuse of connections was believed to be an
- important goal during the early TCP development. This
- requirement was driven by the hope that TCP would serve as a
- basis for user-level transaction protocols as well as
- connection-oriented protocols. The paradigm discussed was the
- "Christmas Tree" or "Kamikazee" segment that contained SYN and
- FIN bits as well as data. Enthusiasm for this was somewhat
-
-
-
- Jacobson, Braden & Zhang [Page 15]
-
- RFC 1185 TCP over High-Speed Paths October 1990
-
-
- dampened when it was observed that the 3-way SYN handshake and
- the FIN handshake mean that 5 packets are required for a minimum
- exchange. Furthermore, the TIME-WAIT state delay implies that
- the same connection really cannot be reopened immediately. No
- further work has been done in this area, although existing
- applications (especially SMTP) often generate very short TCP
- sessions. The reuse problem is generally avoided by using a
- different port pair for each connection.
-
-
- |- 2**32 ISN ISN
- | * *
- | * *
- | * *
- | *x *
- | o *
- ^ | * *
- | | * x *
- | * o *
- S | *o *
- e | o *
- q | * *
- | * *
- # | * x *
- | *o *
- |o_______________*____________
- ^ Time -->
- 4.55hrs
-
-
- Figure 1. Clock-Driven ISN avoiding duplication on
- short-Lived, slow connections.
-
-
- However, clock-driven ISN selection does not protect against old
- duplicate packets for a long-lived or fast connection: the
- connection may close (or crash) just as the ISN has cycled around and
- reached the same value again. If the connection is then reopened, a
- datagram still in transit from the old connection may fall into the
- current window. This is illustrated by Figure 2 for a slow, long-
- lived connection, and by Figures 3 and 4 for fast connections. In
- each case, the point "x" marks the place at which the original
- connection closes or crashes. The arrow in Figure 2 illustrates an
- old duplicate segment. Figure 3 shows a connection whose total byte
- count Nc < 2**32, while Figure 4 concerns Nc >= 2**32.
-
- To prevent the duplication illustrated in Figure 2, Tomlinson
- proposed to "resynchronize" the connection sequence numbers if they
-
-
-
- Jacobson, Braden & Zhang [Page 16]
-
- RFC 1185 TCP over High-Speed Paths October 1990
-
-
- came within an MSL of the ISN. Resynchronization might take the form
- of a delay (point "y") or the choice of a new sequence number (point
- "z").
-
- |- 2**32 ISN ISN
- | * *
- | * *
- | * *
- | * *
- | * *
- ^ | * *
- | | * *
- | * *
- S | * *
- e | * x* y
- q | * o *
- | * o *z
- # | *o *
- | * *
- |*_________________*____________
- ^ Time -->
- 4.55hrs
-
- Figure 2. Resynchronization to Avoid Duplication
- on Slow, Long-Lived Connection
-
-
-
- |- 2**32 ISN ISN
- | * *
- | x o * *
- | * *
- | o-->o* *
- | * *
- ^ | o o *
- | | * *
- | o * *
- S | * *
- e | o * *
- q | * *
- | o* *
- # | * *
- | o *
- |*_________________*____________
- ^ Time -->
- 4.55hrs
-
- Figure 3. Duplication on Fast Connection: Nc < 2**32 bytes
-
-
-
- Jacobson, Braden & Zhang [Page 17]
-
- RFC 1185 TCP over High-Speed Paths October 1990
-
-
- |- 2**32 ISN ISN
- | o * *
- | x * *
- | * *
- | o * *
- | o *
- ^ | * *
- | | o * *
- | * o *
- S | * *
- e | o * *
- q | * o *
- | * *
- # | o *
- | * o *
- |*_________________*____________
- ^ Time -->
- 4.55hrs
-
- Figure 4. Duplication on Fast Connection: Nc > 2**32 bytes
-
- In summary, Figures 1-4 illustrated four possible failure modes for
- old duplicate packets from an earlier incarnation. We will call
- these four modes F1 , F2, F3, and F4:
-
-
- F1: B < R, Tc < 4.55 hrs. (Figure 1)
-
- F2: B < R, Tc >= 4.55 hrs. (Figure 2)
-
- F3: B >= R, Nc < 2**32 (Figure 3)
-
- F4: B >= R, Nc >= 2**32 (Figure 4)
-
-
- Another limitation of clock-driven ISN selection should be mentioned.
- Tomlinson assumed that the current time t in formula [2] is obtained
- from a clock that is persistent over a system crash. For his scheme
- to work correctly, the clock must be restarted with an accuracy of
- 1/R seconds (e.g, 4 microseconds in the case of TCP). While this may
- be possible for some hosts and some crashes, in most cases there will
- be an uncertainty in the clock after a crash that ranges from a
- second to several minutes.
-
- As a result of this random clock offset after system
- reinitialization, there is a possibility that old segments sent
- before the crash may fall into the window of a new connection
- incarnation. The solution to this problem that was adopted in the
-
-
-
- Jacobson, Braden & Zhang [Page 18]
-
- RFC 1185 TCP over High-Speed Paths October 1990
-
-
- final TCP spec is a "quiet time" of MSL seconds when the system is
- initialized [Postel81, p. 28]. No TCP connection can be opened until
- the expiration of this quiet time.
-
- A different approach was suggested by Garlick, Rom, and Postel
- [Garlick77]. Rather than using clock-driven ISN selection, they
- proposed to maintain connection records containing the last ISN used
- on every connection. To immediately open a new incarnation of a
- connection, the ISN is taken to be greater than the last sequence
- number of the previous incarnation, so that the new incarnation will
- have unique sequence numbers. To handle a system crash, they
- proposed a quiet time, i.e., a delay at system startup time to allow
- old duplicates to expire. Note that the connection records need be
- kept only for MSL seconds; after that, no collision is possible, and
- a new connection can start with sequence number zero.
-
- The scheme finally adopted for TCP combines features of both these
- proposals. TCP uses three mechanisms:
-
- (A) ISN selection is clock-driven to handle short-lived connections.
- The parameter R = 250KBps, so that the ISN value cycles in
- 2**32/R = 4.55 hours.
-
- (B) (One end of) a closed connection is left in a "busy" state,
- known as "TIME-WAIT" state, for a time of 2*MSL. TIME-WAIT
- state handles the proper close of a long-lived connection
- without resynchronization. It also allows reliable completion
- of the full-duplex close handshake.
-
- (C) There is a quiet time of one MSL at system startup. This
- handles a crash of a long-lived connection and avoids time
- resynchronization problems in (A).
-
- Notice that (B) and (C) together are logically sufficient to prevent
- accidental reuse of sequence numbers from a different incarnation,
- for any of the failure modes F1-F4. (A) is not logically necessary
- since the close delay (B) makes it impossible to reopen the same TCP
- connection immediately. However, the use of (A) does give additional
- assurance in a common case, perhaps compensating for a host that has
- set its TIME-WAIT state delay too short.
-
- Some TCP implementations have permitted a connection in the TIME-WAIT
- state to be reopened immediately by the other side, thus short-
- circuiting mechanism (B). Specifically, a new SYN for the same
- socket pair is accepted when the earlier incarnation is still in
- TIME-WAIT state. Old duplicates in one direction can be avoided by
- choosing the ISN to be the next unused sequence number from the
- preceding connection (i.e., FIN+1); this is essentially an
-
-
-
- Jacobson, Braden & Zhang [Page 19]
-
- RFC 1185 TCP over High-Speed Paths October 1990
-
-
- application of the scheme of Garlick, Rom, and Postel, using the
- connection block in TIME-WAIT state as the connection record.
-
- However, the connection is still vulnerable to old duplicates in the
- other direction. Mechanism (A) prevents trouble in mode F1, but
- failures can arise in F2, F3, or F4; of these, F2, on short, fast
- connections, is the most dangerous.
-
- Finally, we note TCP will operate reliably without any MSL-based
- mechanisms in the following restricted domain:
-
- * Total data sent is less then 2**32 octets, and
-
- * Effective sustained rate less than 250KBps, and
-
- * Connection duration less than 4.55 hours.
-
- At the present time, the great majority of current TCP usage falls
- into this restricted domain. The third component, connection
- duration, is the most commonly violated.
-
- Security Considerations
-
- Security issues are not discussed in this memo.
-
- Authors' Addresses
-
- Van Jacobson
- University of California
- Lawrence Berkeley Laboratory
- Mail Stop 46A
- Berkeley, CA 94720
-
- Phone: (415) 486-6411
- EMail: van@CSAM.LBL.GOV
-
-
- Bob Braden
- University of Southern California
- Information Sciences Institute
- 4676 Admiralty Way
- Marina del Rey, CA 90292
-
- Phone: (213) 822-1511
- EMail: Braden@ISI.EDU
-
-
-
-
-
-
- Jacobson, Braden & Zhang [Page 20]
-
- RFC 1185 TCP over High-Speed Paths October 1990
-
-
- Lixia Zhang
- XEROX Palo Alto Research Center
- 3333 Coyote Hill Road
- Palo Alto, CA 94304
-
- Phone: (415) 494-4415
- EMail: lixia@PARC.XEROX.COM
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
- Jacobson, Braden & Zhang [Page 21]
-